This report introduces some handy tips and links for generating an Rmarkdown report and conducts exploratory analyses.
Here is an example of conducting preliminary analyses using R and reporting the results in Rmarkdown.
Say we have some publicly available data, such as the famous Iris data set. To find out more about this data set, run:
```r
# ?iris # (this opens in web browser)
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
dim(iris) # 150 rows and 5 columns/features
## [1] 150   5
```
References:
Fisher, R. A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, 179–188.
The data were collected by Anderson, Edgar (1935). The irises of the Gaspe Peninsula, Bulletin of the American Iris Society, 59, 2–5.
The first rule of data analysis is to visualise the data before running any models or analyses. Data visualisations provide a quick and effective way to see and understand trends, outliers and patterns in the data.
Is there a possible association between iris petal and sepal features?
Here are two ways to look at correlations. The first is a static plot. For the second, hover the cursor over the plot to read the pairwise Pearson correlations.
```
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length            1  -0.1175698    0.8717538   0.8179411
## Sepal.Width            NA   1.0000000   -0.4284401  -0.3661259
## Petal.Length           NA          NA    1.0000000   0.9628654
## Petal.Width            NA          NA           NA   1.0000000
```
Code for above plots can be found here and here.
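As a rough guide, the correlation matrix printed above can be reproduced with base R alone. This is a minimal sketch, not necessarily the code behind the original plots; the object names `num_cols` and `cors` are chosen here for illustration.

```r
# Pairwise Pearson correlations between the four numeric columns of iris.
# (num_cols and cors are illustrative names, not taken from the original code.)
num_cols <- iris[, 1:4]
cors <- cor(num_cols, method = "pearson")

# Blank out the lower triangle so that each pair appears only once.
cors[lower.tri(cors)] <- NA
print(cors)
```

Setting the lower triangle to `NA` is only a display choice; the full symmetric matrix from `cor()` carries the same information.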
There is a strong positive correlation (0.87) between petal and sepal lengths. Let’s look at their scatter plot. Hover the cursor over the individual points to get point data. Click on a legend group to include or exclude that group.
As sepal length increases, so does petal length, and vice versa. The points also appear to cluster by species. Setosa is out on its own, whereas Versicolor and Virginica show less separation between them.
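A static stand-in for the interactive scatter plot can be sketched with base graphics. This is an assumption about how one might draw it, not the original plotting code; the colour choices and the names `species_cols` and `petal_range` are hypothetical. The per-species petal length ranges at the end give a numeric check on the separation described above.

```r
# Static scatter plot of petal length against sepal length, coloured by species.
# (species_cols is an illustrative colour mapping, not from the original report.)
species_cols <- c(setosa = "steelblue", versicolor = "darkorange",
                  virginica = "forestgreen")
plot(iris$Sepal.Length, iris$Petal.Length,
     col = species_cols[as.character(iris$Species)], pch = 19,
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)")
legend("topleft", legend = names(species_cols), col = species_cols, pch = 19)

# Minimum (row 1) and maximum (row 2) petal length within each species.
petal_range <- sapply(split(iris$Petal.Length, iris$Species), range)
print(petal_range)
```

If the largest Setosa petal is shorter than the smallest Versicolor petal, Setosa is fully separated on this axis, while overlapping ranges for Versicolor and Virginica match the weaker separation seen in the plot.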
Now that we have established there is an association between sepal and petal length, let’s model it. We fit a simple linear model with sepal length as the outcome and petal length as the covariate. For now we ignore the three species clusters and model the data set as a whole.
| | Coefficient | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 4.30 | 0.078 | 55 | < 0.001 |
| Petal.Length | 0.41 | 0.019 | 22 | < 0.001 |
For every 1 cm increase in petal length, sepal length increases by 0.41 cm on average.
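The model described above can be fitted with `lm()`. The sketch below assumes the default ordinary least squares fit on the full data set; the names `fit` and `coef_table` are illustrative, and the rounding is only there to mirror the table.

```r
# Simple linear regression: sepal length (outcome) on petal length (covariate),
# ignoring species. (fit and coef_table are illustrative names.)
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Coefficient table: estimate, standard error, t value and p-value.
coef_table <- coef(summary(fit))
print(signif(coef_table, 2))

# Slope: expected change in sepal length (cm) per 1 cm of petal length.
slope <- coef(fit)[["Petal.Length"]]
print(round(slope, 2))
```

Printing `summary(fit)` in full also shows the residual spread and R-squared, which help judge how well a single line describes all three species together.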